Abstract


Generally, when stochastic approximation is used to identify the coefficients of a linear system or for an adaptive filter or equalizer, the iterate $X_n$ is projected back onto some bounded set $G = \{x : |x_i| \le B, \text{ all } i\}$ if it ever leaves it. The convergence of such truncated sequences has been discussed informally. Here it is shown, under very broad conditions on the noises, that $\{X_n\}$ converges w.p.1 to the closest point in $G$ to the optimum value of $X_n$. Also, under even weaker conditions, the case of a constant coefficient sequence is treated, and a weak convergence result obtained. The set $G$ is used for simplicity. It can be seen that the result holds true in more general cases, but the box is used since it is the only commonly used constraint set for this problem.

Reference [1] deals with a great variety of stochastic approximation procedures, for constrained and unconstrained systems and for convergence w.p.1 and weak convergence, all for systems with correlated inputs. The techniques of [1] are readily usable for many problems that are not explicitly treated there. This will be illustrated here for one particular class of constrained problems which is of great current interest and which arises in identification and in adaptive control theory. In fact, it is just such constrained problems to which more attention should be given, owing to their prevalence. The proofs are contained in various parts of [1], and here, after the problem is defined, it is shown how to put the bits and pieces together. The problem and method are typical of a large class of adaptive systems which can be treated by similar methods, and are worth singling out.


The problem will be set up in such a way that it fits both a standard identification problem and a standard problem in adaptive equalizers. Let $\{d_n\}$ denote a scalar valued desired output sequence, perhaps a training or reference signal, or the output of the system to be identified. The problem can readily be set up so that all quantities $\{d_n, u_n, x_{ni}, \rho_n\}$ are complex valued, but in the interest of simplicity, we suppose that they are real valued. Let $\{u_n\}$ denote an input sequence, set $\psi_n = (u_n, \ldots, u_{n-r+1})'$, and let $\{\rho_n\}$ be a noise sequence, independent of $\{u_n\}$. The observed adaptive system output at time $n$ is defined by

$y_n = \sum_{i=0}^{r-1} x_{ni} u_{n-i}$,

and the 'perturbed' observed reference at time $n$ is $d_n + \rho_n$. The idea in [2], [3], [4] and in many other papers is to adjust the system parameter $X_n = (x_{n0}, \ldots, x_{n,r-1})'$ so that the output $\{y_n\}$ 'best matches' the $\{d_n\}$ in a mean square sense.

A common recursive adaptive algorithm for doing this is

(1.1)  $X_{n+1} = X_n - a_n \psi_n \epsilon_n, \quad \epsilon_n = (y_n - d_n - \rho_n)$
       $\phantom{X_{n+1}} = X_n - a_n \psi_n (\psi_n' X_n - d_n - \rho_n), \quad a_n \to 0, \ \sum_n a_n = \infty, \ a_n > 0.$


Algorithm (1.1) has been the focus of an enormous amount of effort. In practice, there is usually given a bound $B$ such that if some $|x_{ni}| > B$, then $x_{ni}$ is immediately reset to the closest value $+B$ or $-B$. This projected version has received little attention. Ljung [2] discusses it, but deals with it only when the optimum value of $X_n$ is strictly inside the box $G = \{x : |x_i| \le B, \text{ all } i\}$. The methods of [1] can readily handle such problems whether or not the unconstrained optimum is in $G$. Assumptions are stated in Section 2. These are of the type used in [1] and are quite unrestrictive. In Section 3, it is shown that $\{X_n\}$ converges w.p.1 (under the assumptions in Section 2) to the point in $G$ which is closest to the optimum value. Incidentally, if the optimum is strictly interior to $G$, then the rate of convergence results in [5] hold. Section 4 deals with a formulation where $a_n = \beta > 0$, a constant, and discusses some limit results of a 'weak convergence' nature, also using techniques from [1].
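The reset rule described above can be sketched in a few lines. This is only an illustration of one step of (1.1) followed by the reset to the box; the function names `project_box` and `projected_step`, and the numerical values, are assumptions made for the example and do not come from the references.

```python
def project_box(x, bound):
    """Clip each coordinate to [-bound, bound]; this is the nearest point in the box G."""
    return [max(-bound, min(bound, xi)) for xi in x]


def projected_step(x, psi, observed_reference, a, bound):
    """One step of (1.1), then reset any coordinate that left the box.

    x                  : current parameter vector X_n
    psi                : regressor (u_n, ..., u_{n-r+1})
    observed_reference : the perturbed reference d_n + rho_n
    a                  : step size a_n
    """
    y = sum(xi * pi for xi, pi in zip(x, psi))      # adaptive system output y_n
    err = y - observed_reference                    # epsilon_n = y_n - d_n - rho_n
    unconstrained = [xi - a * pi * err for xi, pi in zip(x, psi)]
    return project_box(unconstrained, bound)


# Illustrative numbers only: the first coordinate overshoots B = 1 and is reset.
x_next = projected_step(x=[0.9, -0.2], psi=[1.0, 0.5],
                        observed_reference=3.0, a=0.5, bound=1.0)
print(x_next)
```

Because the constraint set is a box, the projection acts coordinate by coordinate, which is why a simple clip suffices here.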

In many of the proofs in [1], it is assumed that the iterate sequence $\{X_n\}$ is bounded in some sense. Owing to the possible use of the projection algorithm (as in this paper), this boundedness assumption is hardly a restriction. This is one of the secondary points of this paper.


2. Assumptions for w.p.1 Convergence.

Define $t_n = \sum_{i=0}^{n-1} a_i$ and

$m(t) = \max\{n : t_n \le t\}$ for $t \ge 0$, $\quad m(t) = 0$ for $t < 0$.
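As a small illustration of the time scale just defined, the sketch below computes the interpolation times $t_n$ and the index function $m(t)$ for a finite list of step sizes. The helper names and the choice $a_n = 1/(n+1)$ are assumptions made for the example.

```python
from bisect import bisect_right
from itertools import accumulate


def interpolation_times(steps):
    """Return [t_0, ..., t_N] with t_0 = 0 and t_n = a_0 + ... + a_{n-1}."""
    return [0.0] + list(accumulate(steps))


def m(t, times):
    """m(t) = max{n : t_n <= t} for t >= 0, and 0 for t < 0.

    Only meaningful for t below the last stored time, since the list is finite.
    """
    if t < 0:
        return 0
    return bisect_right(times, t) - 1


steps = [1.0 / (n + 1) for n in range(10)]   # illustrative choice a_n = 1/(n+1)
times = interpolation_times(steps)           # 0, 1, 1.5, 1.8333..., ...
print(m(1.6, times))                         # index of the last t_n not exceeding 1.6
```

With this convention a sum over $i = m(t), \ldots, m(t+s)$ covers, roughly, the iterates that fall in the interpolated time window $[t, t+s]$.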


A1. There is a positive definite symmetric matrix $R$ such that for each $\epsilon > 0$ and some $T < \infty$

(2.1)  $\lim_{n \to \infty} P\{\sup_{j \ge n} \max_{t \le T} |\sum_{i=m(jT)}^{m(jT+t)-1} a_i (\psi_i \psi_i' - R)| \ge \epsilon\} = 0.$


A2. There is a vector $S$ such that (2.1) holds with $(\psi_j d_j - S)$ replacing $(\psi_j \psi_j' - R)$.

A3. (2.1) holds with $\psi_j \rho_j$ replacing $(\psi_j \psi_j' - R)$.

Note that (A1)-(A3) imply that

$a_n [\,|\psi_n \psi_n'| + |\psi_n d_n| + |\psi_n \rho_n|\,] \to 0$ w.p.1 as $n \to \infty$.

Also, $|\sum_{i=m(t)}^{m(t+s)} a_i (\psi_i \psi_i' - R)| \to 0$ as $t \to \infty$, uniformly on bounded $s$-intervals w.p.1, and similarly for the cases of (A2), (A3). (See [1, Lemma 2.2.1] for the proof.)

Assumptions (A1)-(A3) are quite unrestrictive. [1, pp. 30-38] gives several ways of readily verifying them, and they hold in practical cases where (at least) the sequences $\{u_n, \rho_n, d_n\}$ are stationary. The reader is referred to the cited reference for more detail. We note only that the criteria for (A1)-(A3) in [1] can be weakened even further by a finer use of laws of large numbers and estimates of the type [1, (2.2.8)], which give bounds on $E \max_{n \le i \le N} |\sum_{j=n}^{i} \xi_j|^2$ in terms of bounds on the correlation function of $\{\xi_n\}$. (A1)-(A3) hold when $(\psi_i \psi_i' - R)$, etc., obey certain laws of large numbers or when their covariances go to zero sufficiently fast. E.g., if the processes are stationary and the covariances are summable, and $\sum_i (a_i \log_2 i)^2 < \infty$, then they hold [1, Theorem 2.2.2].


3. Convergence w.p.1.

For any $x$, let $\pi_G(x)$ denote the nearest point in $G$ to $x$. Then the truncated or projected form of (1.1) is

(3.1)  $\tilde X_{n+1} = X_n - a_n \psi_n (\psi_n' X_n - d_n - \rho_n) \equiv X_n - a_n h_1(X_n, \xi_n),$
       $X_{n+1} = \pi_G(\tilde X_{n+1}),$

where $\xi_n = (\psi_n, d_n, \rho_n)$.
Define $h(x) = -Rx + S$, and $\theta = R^{-1} S$. Then $h(x) = -R(x - \theta)$. Define the projection $\pi(h(x))$ of the vector field given by $h(x)$ onto $G$ by

$\pi(h(x)) = \lim_{0 < \Delta \to 0} [\pi_G(x + \Delta h(x)) - x] / \Delta.$

Then $\dot x = \pi(h(x))$ is the 'constrained' or 'projected' flow corresponding to $\dot x = h(x)$. Basically, the limits of $\{X_n\}$ are those of the solution to $\dot x = \pi(h(x))$, which in this case is the nearest point on $G$ to $\theta$.
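The projected vector field can be evaluated numerically straight from its definition as a one-sided difference quotient. The sketch below does this for the box; the matrix `R`, the vector `theta`, the bound and the step `delta` are illustrative assumptions, and `projected_field` is a hypothetical helper name.

```python
def project_box(x, bound):
    """Nearest point in the box G = {x : |x_i| <= bound}."""
    return [max(-bound, min(bound, xi)) for xi in x]


def h(x, R, theta):
    """Unconstrained mean vector field h(x) = -R (x - theta)."""
    n = len(x)
    return [-sum(R[i][j] * (x[j] - theta[j]) for j in range(n)) for i in range(n)]


def projected_field(x, R, theta, bound, delta=1e-6):
    """Difference quotient [pi_G(x + delta*h(x)) - x] / delta for a small delta > 0."""
    hx = h(x, R, theta)
    moved = project_box([xi + delta * hi for xi, hi in zip(x, hx)], bound)
    return [(mi - xi) / delta for mi, xi in zip(moved, x)]


R = [[1.0, 0.0], [0.0, 1.0]]   # assumed for illustration
theta = [2.0, 0.3]             # unconstrained optimum, outside the box in coordinate 0
interior = projected_field([0.0, 0.0], R, theta, bound=1.0)
boundary = projected_field([1.0, 0.0], R, theta, bound=1.0)
print(interior, boundary)
```

In the interior the projected field agrees with $h(x)$. On the face $x_0 = B$, where $h$ points out of the box, the outward component is removed and only the tangential component survives.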

Theorem 1. Assume (A1)-(A3). Then $\{X_n\}$, given by (3.1), converges w.p.1 to the closest point $\pi_G(\theta)$ in $G$ to $\theta$, the optimal value.
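A small simulation can make the statement concrete. The sketch below is not part of the proof; it runs (3.1) under assumptions chosen purely for illustration: $r = 2$, a white input uniform on $[-1, 1]$ (so that $R = I/3$), Gaussian observation noise, step sizes $a_n = 3/(n+10)$, $B = 1$ and $\theta = (2, 0.3)$, for which $\pi_G(\theta) = (1, 0.3)$. The name `run_projected_sa` is hypothetical.

```python
import random


def project_box(x, bound):
    """Nearest point in the box G = {x : |x_i| <= bound}."""
    return [max(-bound, min(bound, xi)) for xi in x]


def run_projected_sa(n_steps=20000, bound=1.0, theta=(2.0, 0.3), seed=0):
    """Iterate (3.1) with decreasing steps and return the final iterate."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    u_prev = rng.uniform(-1.0, 1.0)
    for n in range(n_steps):
        u = rng.uniform(-1.0, 1.0)
        psi = (u, u_prev)                            # psi_n = (u_n, u_{n-1})
        d = theta[0] * psi[0] + theta[1] * psi[1]    # reference from the 'true' system
        rho = rng.gauss(0.0, 0.1)                    # observation noise rho_n
        a = 3.0 / (n + 10)                           # a_n -> 0, sum a_n = infinity
        err = x[0] * psi[0] + x[1] * psi[1] - d - rho
        x = project_box([x[i] - a * psi[i] * err for i in range(2)], bound)
        u_prev = u
    return x


x_final = run_projected_sa()
print(x_final)   # expected to lie near pi_G(theta) = (1.0, 0.3) under these assumptions
```

The first coordinate of the unconstrained optimum lies outside the box, so the iterate is held at the boundary value $B$ in that coordinate while the second coordinate settles near $0.3$.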

Proof. The various parts of the proof appear in [1, Theorem 5.3.1] (projection algorithm) and [1, Theorems 2.4.1 and 2.4.2] (a general unconstrained Robbins-Monro algorithm), and in order to avoid duplication we will merely put the pieces together.

By (A1)-(A3), $|\tilde X_{n+1} - X_n| \to 0$ w.p.1 and there is a sequence of positive real numbers $\{\gamma_n\}$, $\gamma_n \to 0$ as $n \to \infty$, such that $|\tilde X_{n+1} - X_n| \le \gamma_n/2$ except for a finite number of terms, w.p.1. Let $I_n$ = indicator of the set $\{|\tilde X_{n+1} - X_n| \le \gamma_n/2\}$. Define

$v_n^\gamma = X_n - a_n h_1(X_n, \xi_n) I_n, \quad v_n = X_n - a_n h_1(X_n, \xi_n) \equiv \tilde X_{n+1},$

$\tau_n = \pi_G(v_n^\gamma) - v_n^\gamma, \quad \phi_n = [\pi_G(v_n) - v_n](1 - I_n).$

Then

(3.2)  $X_{n+1} = X_n - a_n h_1(X_n, \xi_n) + \tau_n + \phi_n.$


Following [1, Theorem 5.3.1], define $\tau^0(\cdot)$ and $\phi^0(\cdot)$, resp., to be the piecewise linear functions on $[0, \infty)$ with values $\sum_{i=0}^{n-1} \tau_i$ and $\sum_{i=0}^{n-1} \phi_i$, resp., at time $t_n$. Let $\tau^n(\cdot) = \tau^0(t_n + \cdot) - \tau^0(t_n)$, $\phi^n(\cdot) = \phi^0(t_n + \cdot) - \phi^0(t_n)$. Let $\bar X^0(\cdot)$ and $\bar \xi^0(\cdot)$ denote the piecewise constant functions on $[0, \infty)$ which are equal to $X_n$ and $\xi_n$, resp., on $[t_n, t_{n+1})$, where $t_n = \sum_{i=0}^{n-1} a_i$. Define $M_1^n(t) = -\int_0^t h_1(\bar X^0(t_n + s), \bar \xi^0(t_n + s))\,ds$, and let $X^0(t)$ denote the piecewise linear function with value $X_n$ at $t_n$ and set $X^n(\cdot) = X^0(t_n + \cdot)$. Then ($X^n(0) = X_n$)

$X^n(t) = X^n(0) + M_1^n(t) + \phi^n(t) + \tau^n(t).$


Fix $\omega$ not in one of the exceptional sets of zero probability of (A1)-(A3). Then as in [1, Theorem 5.3.1], $\phi^n(\cdot) \to 0$ uniformly on bounded intervals as $n \to \infty$. Similarly (following the same proof) if $\{M_1^n(\cdot)\}$ were equicontinuous, then so would $\{\tau^n(\cdot)\}$ and $\{X^n(\cdot)\}$ be.

Suppose for the moment that $\{M_1^n(\cdot)\}$ is equicontinuous. Then we can select a subsequence (also indexed by $n$) such that all $X^n(\cdot)$, $X^n(0)$, $M_1^n(\cdot)$, $\phi^n(\cdot)$, $\tau^n(\cdot)$ converge uniformly on bounded intervals, with limits denoted by $X(\cdot)$, $X(0)$, $M_1(\cdot)$, $0$, $\tau(\cdot)$. Then

(3.3)  $X(t) = X(0) + \int_0^t \dot M_1(s)\,ds + \tau(t).$

* 

We need only prove the equicontinuity of $\{M_1^n(\cdot)\}$ and characterize $M_1(\cdot)$. This is done in the next two paragraphs. It will turn out that $\dot M_1(s) = -RX(s) + S$. But then the proof of [1, Theorem 5.3.1] implies that

$\dot X = \pi(-RX + S) = \pi(-R(X - \theta)).$

Define $f(x) = (x - \theta)'(x - \theta)$. Then $\dot f(x) = 2(x - \theta)' \pi(-R(x - \theta))$, and [1, Theorem 5.3.1] implies our theorem because the only points where $\dot f(x) = 0$ are $x = \theta$ and $x = \pi_G(\theta)$ (which equals $\theta$ when $\theta \in G$). The uniqueness of the solution to the limiting differential equation (for each $X(0)$), and the uniqueness of its limit point, imply that the particular fixed $\omega$ and the chosen subsequence are irrelevant.
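To see how the limiting projected differential equation behaves, one can integrate it with a projected Euler scheme and record $f(x) = (x - \theta)'(x - \theta)$ along the path. The sketch below assumes, for simplicity, a diagonal $R$ (here $R = I/3$), $B = 1$ and $\theta = (2, 0.3)$; the step `dt`, the horizon and the helper name are illustrative choices only, and the monotone decrease checked here is verified only for this diagonal case.

```python
def project_box(x, bound):
    """Nearest point in the box G = {x : |x_i| <= bound}."""
    return [max(-bound, min(bound, xi)) for xi in x]


def euler_projected_flow(x0, r_diag, theta, bound, dt=0.01, n_steps=5000):
    """Projected Euler steps x <- pi_G(x + dt * h(x)) with h(x) = -R (x - theta), R diagonal."""
    x = list(x0)
    f_values = []
    for _ in range(n_steps):
        # f(x) = |x - theta|^2, recorded before each step
        f_values.append(sum((xi - ti) ** 2 for xi, ti in zip(x, theta)))
        drift = [-r * (xi - ti) for r, xi, ti in zip(r_diag, x, theta)]
        x = project_box([xi + dt * di for xi, di in zip(x, drift)], bound)
    return x, f_values


x_limit, f_values = euler_projected_flow(
    x0=[-0.5, -0.8], r_diag=[1.0 / 3.0, 1.0 / 3.0], theta=[2.0, 0.3], bound=1.0)
print(x_limit)   # approaches pi_G(theta) = (1.0, 0.3) in this example
```

In this example the path moves toward $\theta$ until the first coordinate reaches the face $x_0 = B$, after which it slides along that face to the limit point.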




Equicontinuity of $\{M_1^n(\cdot)\}$ for the fixed $\omega$. The equicontinuity and the characterization of $M_1(\cdot)$ follow by use of the method of [1, Theorems 2.4.1 and 2.4.2]. If $\{\xi_n\}$ is a bounded sequence, then $\{M_1^n(\cdot)\}$ is obviously equicontinuous since $h_1(\bar X^0(u), \bar \xi^0(u))$ is uniformly bounded. Otherwise, use the method of [1, Theorem 2.4.2], whose conditions are implied by (A1)-(A3). We can write

$X_{n+1} = X_n - a_n (\psi_n \psi_n' - R) X_n + a_n (\psi_n d_n + \psi_n \rho_n - S) - a_n (R X_n - S) + \phi_n + \tau_n.$


Uniform continuity on $[0, \infty)$ of the piecewise linear function with value $\sum_{i=0}^{n-1} a_i (\psi_i d_i + \rho_i \psi_i - S)$ at $t_n$ follows from (A2), (A3) for our fixed $\omega$. (See the second sentence below (A3).)

We only need show uniform continuity on $[0, \infty)$ of the piecewise linear function with value $\sum_{i=0}^{n-1} a_i |\psi_i \psi_i' - R|$ at $t_n$. Once this is done, $\{M_1^n(\cdot)\}$, the shifted sequence, will obviously be equicontinuous (the $X_n$ coefficient, being bounded, is not important). Let $\psi_n = (\psi_{n0}, \ldots, \psi_{n,r-1})'$. It is sufficient to show the uniform continuity for the linear interpolation of $\sum_{j=0}^{n-1} a_j \psi_{ji}^2$ for each $i$. But this follows from (A1), which implies that $\sum_{j=0}^{n-1} a_j \psi_{ji}^2$ increases asymptotically as $\sum_{j=0}^{n-1} a_j R_{ii} = t_n R_{ii}$, since (see note below (A3))

$|\sum_{j=m(t)}^{m(t+s)} a_j (\psi_{ji}^2 - R_{ii})| \to 0$ uniformly on bounded $s$-intervals as $t \to \infty$,

i.e., the uniform continuity holds for the piecewise linear function with value $\sum_{j=0}^{n-1} a_j \psi_{ji}^2$ at $t_n$.

Characterization of the limit $M_1(\cdot)$. Let $n$ still index the convergent subsequence. For our fixed $\omega$, we need only show that

(3.4)  $|\sum_{j=m(t_n)}^{m(t_n + t)} a_j (\psi_j \psi_j' - R) X_j| \to 0$ uniformly on bounded $t$-intervals as $n \to \infty$,

since it follows from this and (A2), (A3) that $M_1^n(t) \to \int_0^t [-RX(s) + S]\,ds$, i.e., $\dot M_1(\cdot) = -RX(\cdot) + S$, as desired.

(3.4) can readily be done by following almost word for word the method of proof of a similar result in [1, Theorem 2.4.1], with the identifications $g_3(|X - X'|) = \text{const} \cdot |X - X'|$, $g_1 = \text{constant}$ and $g_2(\xi_n) = \sum_{i=0}^{r-1} \psi_{ni}^2$. Then (A1) implies (3.4). The argument in the reference shows that the $\{X_n\}$ fluctuations are 'slow enough' in comparison with those of the $(\psi_j \psi_j' - R)$ for the limit of (3.4) to be the same as it would be if $X_n$ were a constant.

Q.E.D.


4. Constant $a_n = \beta > 0$.

Theorems 4.3.1 and 6.2.3 of [1] present the analogs of Theorems 2.4.1 and 5.3.1, resp., under weaker conditions than (A1)-(A3), but where the convergence is in a weak sense. Instead of adapting them to the current projection problem, we merely formulate their use in another related and very important problem: algorithm (3.1) where $a_n = \beta > 0$. Indeed, many, if not most, uses of (3.1) use constant coefficients $a_n \equiv \beta$ (at least for the 'tails', once the 'transient' period is over). For small $\beta$, it might take quite a while (the transient period) for $X_n$ to get near its limit ($\theta$ or the projection of $\theta$ on $G$). We are concerned (more or less) with what occurs after this transient period. Let $X_n^\beta$ denote the solution to (3.1) and define $X^\beta(\cdot)$ = piecewise linear interpolation (intervals $\beta$) of $\{X_n^\beta\}$. Let $\{N_\beta\}$ be a sequence of integers (roughly defining the transient period, perhaps), such that $N_\beta \to \infty$ and $N_\beta \beta \to \infty$ as $\beta \to 0$.

A4. Let $\xi_n$ represent $(\psi_n \psi_n' - R)$ or $\psi_n d_n - S$ or $\psi_n \rho_n$. Here $m(t) = [t/\beta]$, the integral part of $t/\beta$. Assume

$\lim_{\beta \to 0} \sup_{t \ge 0} P\{\max_{s \le T} |\beta \sum_{i=m(t)}^{m(t+s)} \xi_i| \ge \epsilon\} = 0$, any $T < \infty$, each $\epsilon > 0$.

This condition is discussed after the Theorem.


Theorem 2. Assume that $E|\psi_n \psi_n'|^2$, $E|\psi_n \rho_n|^2$ and $E|\psi_n d_n|^2$ are uniformly bounded, and assume (A4). Then $\{X^\beta(N_\beta \beta + \cdot)\}$ converges weakly (in the function space $C^r[0, \infty)$) to the constant function $\pi_G(\theta)$ as $\beta \to 0$, where $\pi_G(\theta)$ = nearest point to $\theta$ on $G$. In particular $X^\beta_{N_\beta + k} \to \pi_G(\theta)$ in probability as $\beta \to 0$, each $k \ge 0$, and more strongly

$\lim_{\beta \to 0} P\{\sup_{k \le t/\beta} |X^\beta_{N_\beta + k} - \pi_G(\theta)| \ge \epsilon\} = 0$ for each $t$ and $\epsilon > 0$.

Thus for small $\beta$ and large $n$, $\{X_n\}$ 'hovers' around $\pi_G(\theta)$ as desired.
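The 'hovering' behaviour can be illustrated with a constant-step run of (3.1). The sketch below is an illustration only, under assumptions made for the example: $r = 2$, white input uniform on $[-1, 1]$, Gaussian noise, $B = 1$, $\theta = (2, 0.3)$, $\beta = 0.01$, and a transient of 5000 steps standing in for $N_\beta$. The function name is hypothetical.

```python
import random


def project_box(x, bound):
    """Nearest point in the box G = {x : |x_i| <= bound}."""
    return [max(-bound, min(bound, xi)) for xi in x]


def max_deviation_after_transient(beta=0.01, n_transient=5000, n_watch=2000,
                                  bound=1.0, theta=(2.0, 0.3), seed=1):
    """Run (3.1) with a_n = beta; return the largest coordinate deviation from
    pi_G(theta) observed over n_watch steps after the transient."""
    rng = random.Random(seed)
    target = project_box(list(theta), bound)          # pi_G(theta)
    x = [0.0, 0.0]
    u_prev = rng.uniform(-1.0, 1.0)
    worst = 0.0
    for n in range(n_transient + n_watch):
        u = rng.uniform(-1.0, 1.0)
        psi = (u, u_prev)
        d = theta[0] * psi[0] + theta[1] * psi[1]
        rho = rng.gauss(0.0, 0.1)
        err = x[0] * psi[0] + x[1] * psi[1] - d - rho
        x = project_box([x[i] - beta * psi[i] * err for i in range(2)], bound)
        if n >= n_transient:
            worst = max(worst, max(abs(xi - ti) for xi, ti in zip(x, target)))
        u_prev = u
    return worst


deviation = max_deviation_after_transient()
print(deviation)   # stays small for small beta in this example, but does not vanish
```

Unlike the decreasing-step case, the iterate keeps fluctuating; the size of the fluctuation shrinks as $\beta$ is reduced, at the cost of a longer transient.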

The proof will not be given since it follows the general lines of the appropriate parts of [1, Theorems 4.3.1 and 6.2.3], which, in turn, are just weak convergence analogs of the theorems upon which Theorem 1 is based. To adapt the proofs of [1] to the present case, merely replace the shifted sequences $X^n(\cdot)$, etc. of [1] by $X^\beta(N_\beta \beta + \cdot)$, etc. If $E|\xi_j|^2$ is bounded as assumed above then, by [1, Theorem 4.1.1], (A4) holds if there are $R(i)$ such that $|E \xi_j' \xi_{j+i}| \le R(i) \to 0$ as $i \to \infty$, a very weak condition indeed.

Conclusions. The projected iterate sequence converges w.p.1 to the closest point on $G$ to the optimum (converges in probability in the weak convergence case).